Fix/ollama setup and stability#125
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis pull request adds Ollama as a supported LLM provider alongside OpenAI. Infrastructure is updated with named Docker volumes and healthchecks; configuration and initialization support Ollama setup with embedder model configuration; the LLM client layer gains async health checks and provider-specific logic; and the health API and UI are extended to report embedder status. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant HealthRoute
participant LLMClient
participant Ollama as Ollama/OpenAI
Client->>HealthRoute: GET /health
HealthRoute->>LLMClient: health_check()
alt provider == "ollama"
LLMClient->>Ollama: GET /api/tags (aiohttp)
Ollama-->>LLMClient: models list
LLMClient->>Ollama: GET /api/pull (embedder check)
Ollama-->>LLMClient: embedder status
LLMClient-->>HealthRoute: {status, embedder_model, embedder_status}
else provider == "openai"
LLMClient-->>HealthRoute: {status, based on api_key}
end
rect rgb(240, 248, 255)
Note over HealthRoute: Combine LLM + embedder status
HealthRoute->>HealthRoute: audioai_overall_healthy = is_llm_healthy AND is_embedder_healthy
end
HealthRoute-->>Client: {audioai: {healthy, model, embedder_model, embedder_status}, ...}
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes The changes span multiple layers (initialization, LLM client, memory config, routes, UI) with provider-conditional logic scattered across files. The async/await conversion in llm_client.py requires careful review of executor removal and aiohttp integration. While many changes follow similar "if ollama else" patterns (reducing heterogeneity), the LLM client's async health check implementation and multi-provider HTTP handling represent moderate logic density that demands careful scrutiny. Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backends/advanced/src/advanced_omi_backend/memory/memory_service.py (1)
426-434: Avoid storing user_email in vector store metadataEmail is PII and unnecessary for retrieval. Store user_id only or hash the email if needed.
Gate via an env flag (e.g., STORE_USER_EMAIL_IN_METADATA=false) or replace with a hash.
Also applies to: 607-615
backends/advanced/src/advanced_omi_backend/llm_client.py (1)
32-35: Abstract method signature mismatch (sync vs async)
LLMClient.health_checkis sync, but impl is async. Unify the contract.Apply:
- @abstractmethod - def health_check(self) -> Dict: + @abstractmethod + async def health_check(self) -> Dict: """Check if the LLM service is available and healthy.""" passAnd update any references accordingly (factory/users are already awaiting it).
Also applies to: 99-109
🧹 Nitpick comments (10)
backends/advanced/src/advanced_omi_backend/memory/providers/llm_providers.py (2)
242-250: Ollama connection check should use explicit timeout and avoid env-coupling inside provider
- Pass a finite timeout to httpx to avoid hangs.
- Avoid relying on global LLM_PROVIDER; base behavior should be driven by instance config.
Apply:
- if os.getenv("LLM_PROVIDER", "openai").lower() == "ollama": - import httpx - async with httpx.AsyncClient() as client: - # For Ollama, test connection by hitting the /v1/models endpoint - response = await client.get(f"{self.base_url}/models") + if "ollama" in (os.getenv("LLM_PROVIDER", "openai").lower()): + import httpx + async with httpx.AsyncClient(timeout=httpx.Timeout(5.0)) as client: + # For Ollama's OpenAI‑compatible API: + response = await client.get(f"{self.base_url.rstrip('/')}/models") response.raise_for_status() return TruePlease confirm your Ollama is using the OpenAI‑compatible base_url (ending with /v1). If native API is desired here, switch to GET {ollama_root}/api/tags.
186-201: response_format='json_object' may not be supported by all providers/modelsSome OpenAI‑compatible backends (incl. Ollama) ignore or reject response_format. Consider gating this by provider/model or falling back gracefully on error.
Add a quick run against your configured Ollama model to verify JSON‑object support in chat.completions.
Also applies to: 193-195
backends/advanced/src/advanced_omi_backend/memory/config.py (1)
151-154: Provider allowlist duplicated as literalsUse the Enum values to avoid drift.
- if llm_provider not in ["openai", "ollama"]: + if llm_provider not in {p.value for p in LLMProvider}:backends/advanced/src/advanced_omi_backend/memory/memory_service.py (1)
72-75: Provider class naming vs behaviorYou instantiate OpenAIProvider for both OPENAI and OLLAMA. Works with OpenAI‑compatible endpoints, but is confusing.
Rename to OpenAICompatibleProvider or route to a thin OllamaProvider alias for clarity.
backends/advanced/webui/src/pages/System.tsx (1)
7-18: Remove unnecessary any casts by typing service entryYou already declared optional fields; casting to any is avoidable and hides errors.
Define a ServiceStatus interface and use it:
-interface HealthData { +interface ServiceStatus { + healthy: boolean + message?: string + status?: string + base_url?: string + model?: string + embedder_model?: string + embedder_status?: string + provider?: string + worker_count?: number + active_workers?: number + idle_workers?: number +} +interface HealthData { status: 'healthy' | 'partial' | 'unhealthy' - services: Record<string, { - healthy: boolean - message?: string - status?: string; - base_url?: string; - model?: string; - embedder_model?: string; - embedder_status?: string; - provider?: string; - }> + services: Record<string, ServiceStatus> }Then replace
(status as any)usages with typedstatus.Also applies to: 275-290
backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py (2)
195-206: AudioAI health: consider 'partial' when embedder missingTreating “Embedder Model Not Configured” as healthy can mask degraded capability.
- Keep
healthyboolean, but setstatusmessage to reflect partial state when embedder is missing/unhealthy.- Optionally map overall system
statusto 'degraded' if LLM is OK but embedder is not, when MEMORY_PROVIDER=friend_lite.Confirm whether embedder is required for your current flows; adjust semantics accordingly.
Also applies to: 207-216
90-115: Health response exposes internal URIs
mongodb_uriandllm_base_urlleak infra details. If/healthis ever unauthenticated, redact.Mask hosts or omit credentials/components in the JSON payload for non‑admin callers.
backends/advanced/init.py (1)
371-399: .env CORS origins differ from docker-compose CORS_ORIGINSCompose sets a broader CORS list (3010, https). The generated .env shows 3000/5173 only and may confuse admins.
Align defaults or add a note that docker-compose overrides CORS via environment.
backends/advanced/src/advanced_omi_backend/llm_client.py (2)
110-156: Also verify Ollama LLM model availabilityYou check server and embedder; also check the default LLM model exists to avoid false “Connected”.
Apply:
# Check embedder model availability embedder_model_name = os.getenv("OLLAMA_EMBEDDER_MODEL") if embedder_model_name: try: # Use /api/show to check if model exists async with session.post(f"{ollama_health_url}/api/show", json={"name": embedder_model_name}, timeout=aiohttp.ClientTimeout(total=5)) as embedder_response: if embedder_response.status == 200: response_data["embedder_status"] = "✅ Available" else: response_data["embedder_status"] = "⚠️ Embedder Model Unhealthy" except aiohttp.ClientError: response_data["embedder_status"] = "❌ Embedder Model Connection Failed" except asyncio.TimeoutError: response_data["embedder_status"] = "❌ Embedder Model Timeout" else: response_data["embedder_status"] = "⚠️ Embedder Model Not Configured" + # Verify chat model presence too + try: + async with session.post(f"{ollama_health_url}/api/show", json={"name": self.model}, timeout=aiohttp.ClientTimeout(total=5)) as model_resp: + if response_data["status"].startswith("✅") and model_resp.status != 200: + response_data["status"] = "⚠️ Connected, model not pulled" + except (aiohttp.ClientError, asyncio.TimeoutError): + response_data["status"] = "❌ Ollama Model Check Failed"
157-172: OpenAI path returns “Connected” without a network checkThis can mask invalid keys/base URLs. Consider a lightweight call (e.g., list models) with short timeout, or coordinate with MemoryService connection test for consistency.
If you prefer to keep this cheap, reflect config-only status explicitly (e.g., “Configured (not verified)”).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
backends/advanced/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
backends/advanced/docker-compose.yml(5 hunks)backends/advanced/init.py(2 hunks)backends/advanced/src/advanced_omi_backend/llm_client.py(4 hunks)backends/advanced/src/advanced_omi_backend/memory/config.py(4 hunks)backends/advanced/src/advanced_omi_backend/memory/memory_service.py(1 hunks)backends/advanced/src/advanced_omi_backend/memory/providers/llm_providers.py(2 hunks)backends/advanced/src/advanced_omi_backend/routers/modules/health_routes.py(3 hunks)backends/advanced/webui/src/pages/System.tsx(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.0)
backends/advanced/src/advanced_omi_backend/memory/config.py
153-153: Avoid specifying long messages outside the exception class
(TRY003)
195-195: Avoid specifying long messages outside the exception class
(TRY003)
199-199: Avoid specifying long messages outside the exception class
(TRY003)
202-202: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (1)
backends/advanced/init.py (1)
199-202: Good: auto-append /v1 for Ollama base URLPrevents common misconfig. LGTM.
| - mongo_data:/data/db | ||
| healthcheck: | ||
| test: ["CMD", "mongosh", "--eval", "db.runCommand('ping')"] | ||
| interval: 10s | ||
| timeout: 5s | ||
| retries: 5 | ||
| start_period: 10s | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
MongoDB healthcheck command is invalid; will never become healthy
db.runCommand('ping') is not valid; use { ping: 1 }. If this never passes, services depending on service_healthy will stall.
Apply:
- healthcheck:
- test: ["CMD", "mongosh", "--eval", "db.runCommand('ping')"]
- interval: 10s
- timeout: 5s
- retries: 5
- start_period: 10s
+ healthcheck:
+ test: ["CMD", "mongosh", "--quiet", "--eval", "db.adminCommand({ ping: 1 })"]
+ interval: 10s
+ timeout: 5s
+ retries: 5
+ start_period: 10sIf the mongo:8.0.14 image lacks mongosh, switch to mongodb/mongodb-community-server or install the shell.
I'll verify the MongoDB healthcheck syntax claims by searching for the correct mongosh documentation and container image details.
🌐 Web query:
mongosh db.runCommand ping syntax healthcheck
💡 Result:
Short answer — use db.runCommand({ ping: 1 }) and check the returned ok field. Example usages:
-
In mongosh interactive:
db.runCommand({ ping: 1 })
// returns a document with ok: 1 on success. [1][2] -
As a noninteractive healthcheck (Docker / scripts) using mongosh --quiet:
echo 'db.runCommand({ping:1}).ok' | mongosh localhost:27017/mydb --quietreturns "1" when healthy. Use localhost or the container host name as appropriate. [2][3]
Notes:
- The numeric value (1) is arbitrary and does not change behavior. [1]
- Use mongosh (new shell) rather than the legacy mongo in modern images. [3]
Sources: MongoDB ping command and db.runCommand docs; examples from community Docker health-check threads. [1][2][3]
🌐 Web query:
mongo 8.0.14 docker image mongosh included
💡 Result:
Short answer: Yes — the official MongoDB Docker images for the 8.0 line include mongosh (the modern shell). See MongoDB docs/listing of bundled tools (mongosh listed for the community/enterprise images) and the 8.0 release notes. [1][2][3]
Sources:
- MongoDB community/enterprise image tools listing (shows mongosh included). [1]
- MongoDB Docker install docs (notes & tooling / recommends mongosh). [2]
- MongoDB 8.0 changelog (8.0.14 is in the 8.0 series). [3]
If you want I can show how to verify inside a container (docker run + which mongosh / mongosh --version).
MongoDB healthcheck command is invalid; will never become healthy
db.runCommand('ping') is not valid syntax; must be an object like { ping: 1 }. Since the health check will fail, services depending on service_healthy will stall.
Apply:
- healthcheck:
- test: ["CMD", "mongosh", "--eval", "db.runCommand('ping')"]
+ healthcheck:
+ test: ["CMD", "mongosh", "--quiet", "--eval", "db.adminCommand({ ping: 1 })"]
interval: 10s
timeout: 5s
retries: 5
start_period: 10s📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - mongo_data:/data/db | |
| healthcheck: | |
| test: ["CMD", "mongosh", "--eval", "db.runCommand('ping')"] | |
| interval: 10s | |
| timeout: 5s | |
| retries: 5 | |
| start_period: 10s | |
| - mongo_data:/data/db | |
| healthcheck: | |
| test: ["CMD", "mongosh", "--quiet", "--eval", "db.adminCommand({ ping: 1 })"] | |
| interval: 10s | |
| timeout: 5s | |
| retries: 5 | |
| start_period: 10s |
🤖 Prompt for AI Agents
In backends/advanced/docker-compose.yml around lines 154 to 161 the MongoDB
healthcheck uses invalid JS syntax db.runCommand('ping'); update the healthcheck
command to use an object form such as db.runCommand({ ping: 1 }) (e.g. change
the test to run mongosh --eval "db.runCommand({ ping: 1 })"), keep the existing
interval/timeout/retries/start_period values unchanged so the healthcheck can
succeed and dependent services won't stall.
There was a problem hiding this comment.
I have made the changes
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the MongoDB healthcheck configuration in docker-compose.yml
grep -A 6 "mongo:" backends/advanced/docker-compose.yml | grep -A 5 "healthcheck:"
Length of output: 220
@RKRitik Perfect! The MongoDB healthcheck is now correctly configured with db.adminCommand({ ping: 1 }) and the --quiet flag. This will properly report health status and prevent dependent services from stalling. Great work! 🎉
If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!
| embedding_dims = 768 # For nomic-embed-text | ||
|
|
There was a problem hiding this comment.
Hardcoded Ollama embedding dims (768) will break with non‑nomic models
Vector DB dims must match the embedder. If users set a different embedder (e.g., 384/1024), Qdrant ops will fail.
Apply:
- embedding_dims = 768 # For nomic-embed-text
+ # Default for nomic-embed-text; allow override for other embedders
+ embedding_dims = int(os.getenv("OLLAMA_EMBEDDING_DIMS", "768"))Also log the final dims to aid debugging.
Consider probing dims once (generate 1 embedding) and create the collection accordingly, or read from config.
🤖 Prompt for AI Agents
In backends/advanced/src/advanced_omi_backend/memory/config.py around lines
211-212, the embedding_dims value is hardcoded to 768 which will break when
users configure a non-nomic embedder; replace the literal with a derived value
by either (a) reading dims from configuration if provided, or (b) probing the
configured embedder once (generate a single embedding and use its length) before
creating the Qdrant collection; then use that derived dims everywhere instead of
the constant and add a process/app log entry that records the final
embedding_dims value for debugging.
|
Hey, thanks a lot for this PR |
|
Also look into addressing coderabbit's comments please. Thanks! |
|
To give some more context, the "memory service" would use some type of LLM service |
This commit introduces support for using Ollama as an LLM provider and includes several fixes to improve the stability of the setup process. Key changes: - Added Ollama as a selectable LLM provider in the setup script. - Fixed the backend code to correctly handle the Ollama provider. - Improved the docker-compose setup with healthchecks and named volumes to prevent race conditions and permission issues with MongoDB and Caddy. - Updated the setup script to automatically configure the Ollama URL correctly.
Adjusted Ollama health check logic in llm_client.py and llm_providers.py to correctly interact with Ollama's native and OpenAI-compatible APIs, resolving 'LLM provider connection failed' errors.
Introduced a prompt for OLLAMA_EMBEDDER_MODEL in backends/advanced/init.py, defaulting to 'nomic-embed-text:latest', and included it in the generated .env file. This ensures proper configuration for Ollama embedding models.
…ration Removed fallback defaults for OLLAMA_MODEL and OLLAMA_EMBEDDER_MODEL in memory/config.py. These environment variables are now strictly required, and a ValueError will be raised if they are not set.
- Backend (llm_client.py, health_routes.py): Extended health check to include Ollama embedder model status and updated overall health logic. - Frontend (System.tsx): Updated UI to display main Ollama model name and embedder model status in a consistent format.
Cleaned up various comments like '# Added', '# Added for context', '// New field', etc., to improve code readability and adhere to best practices.
Adjusted the health check logic in health_routes.py to correctly determine the 'audioai' service health when the LLM provider is OpenAI. Previously, the embedder status was incorrectly factored into the overall health for OpenAI, leading to an 'unhealthy' status. This change ensures that for OpenAI, the embedder status is considered 'Not Applicable' and does not negatively impact the overall health.
Introduced a prompt for OLLAMA_EMBEDDER_MODEL in backends/advanced/init.py, defaulting to 'nomic-embed-text:latest', and included it in the generated .env file. This ensures proper configuration for Ollama embedding models.
Cleaned up various comments like '# Added', '# Added for context', '// New field', etc., to improve code readability and adhere to best practices.
fix: MongoDB healthcheck command
89afccd to
69fa704
Compare
|
Thanks! |
This commit introduces support for using Ollama as an LLM provider and includes several fixes to improve the stability of the setup process.
Key changes:
Added Ollama as a selectable LLM provider in the setup script.
Fixed the backend code to correctly handle the Ollama provider.
Improved the docker-compose setup with healthchecks and named volumes to prevent race conditions and permission issues with MongoDB and Caddy.
Updated the setup script to automatically configure the Ollama URL correctly.
Adjusted Ollama health check logic in llm_client.py and llm_providers.py to correctly interact with Ollama's native and OpenAI-compatible APIs, resolving 'LLM provider connection failed' errors.
Introduced a prompt for OLLAMA_EMBEDDER_MODEL in backends/advanced/init.py, │
│ defaulting to 'nomic-embed-text:latest', and included it in the generated .env │
│ file. This ensures proper configuration for Ollama embedding models."
Extended health check to include Ollama embedder model status and updated overall health logic.
Ollama embedder online case
Ollama embedder offline case
Summary by CodeRabbit
Release Notes
New Features
Improvements